Boosted Classification Trees and Class Probability/Quantile Estimation
Authors
Abstract
The standard by which binary classifiers are usually judged, misclassification error, assumes equal costs of misclassifying the two classes or, equivalently, classifying at the 1/2 quantile of the conditional class probability function P[y = 1|x]. Boosted classification trees are known to perform quite well for such problems. In this article we consider the use of standard, off-the-shelf boosting for two more general problems: 1) classification with unequal costs or, equivalently, classification at quantiles other than 1/2, and 2) estimation of the conditional class probability function P[y = 1|x]. We first examine whether the latter problem, estimation of P[y = 1|x], can be solved with LogitBoost, and with AdaBoost when combined with a natural link function. The answer is negative: both approaches are often ineffective because they overfit P[y = 1|x] even though they perform well as classifiers. A major negative finding of the present article is the disconnect between class probability estimation and classification. Next we consider the practice of over/under-sampling of the two classes. We present an algorithm that uses AdaBoost in conjunction with Over/Under-Sampling and Jittering of the data (“JOUS-Boost”). This algorithm is simple yet successful, and it preserves boosting's relative protection against overfitting, but now for arbitrary misclassification costs or, equivalently, arbitrary quantile boundaries. We then use collections of classifiers obtained from a grid of quantiles to form estimators of class probabilities. The estimates of the class probabilities compare favorably to those obtained by a variety of methods across both simulated and real data sets.
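The arithmetic behind the abstract's main ideas can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' JOUS-Boost code: every function name is hypothetical, no boosting model is trained, and only the over-sampling half of over/under-sampling is shown. It covers (a) the quantile q = c_FP / (c_FP + c_FN) at which unequal costs place the decision boundary, where c_FP is the cost of labeling a true 0 as 1 and c_FN the cost of labeling a true 1 as 0; (b) the link p = 1 / (1 + exp(-2F)), assumed here as the natural link for an AdaBoost score F(x) because it inverts F = ½ log(p / (1 − p)), the population minimizer of the exponential loss; (c) integer replication factors with uniform jitter on the copies (the noise model is an assumption); and (d) one simple rule, assumed rather than taken from the article, for turning the 0/1 votes of classifiers on a quantile grid into a probability estimate.

```python
import math
import random
from typing import List, Sequence, Tuple

# Illustrative sketch; every function name here is hypothetical and
# no boosting model is trained.


def cost_to_quantile(cost_fp: float, cost_fn: float) -> float:
    """Threshold q on p(x) = P[y=1|x] that minimizes expected cost.

    Predicting 1 costs (1 - p) * cost_fp in expectation and predicting 0
    costs p * cost_fn, so predict 1 iff p > cost_fp / (cost_fp + cost_fn).
    Equal costs give the usual q = 1/2.
    """
    return cost_fp / (cost_fp + cost_fn)


def adaboost_link(score: float) -> float:
    """Assumed link: p = 1 / (1 + exp(-2 F)) for an AdaBoost score F(x)."""
    return 1.0 / (1.0 + math.exp(-2.0 * score))


def sampling_factors(q: float) -> Tuple[int, int]:
    """Integer replication factors (k0, k1) for classes 0 and 1.

    Replicating class 1 k1 times and class 0 k0 times turns p into
    k1*p / (k1*p + k0*(1 - p)), which exceeds 1/2 iff p > k0 / (k0 + k1).
    Hence k1 / k0 = (1 - q) / q; rounding to integers makes q approximate.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    ratio = (1.0 - q) / q
    if ratio >= 1.0:
        return 1, round(ratio)
    return round(1.0 / ratio), 1


def oversample_with_jitter(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    q: float,
    jitter: float = 0.01,
    seed: int = 0,
) -> Tuple[List[List[float]], List[int]]:
    """Over-sample per sampling_factors(q), adding uniform noise to copies only.

    Jittering the replicated points breaks exact ties; uniform noise in
    [-jitter, jitter] is an assumed noise model.
    """
    rng = random.Random(seed)
    k0, k1 = sampling_factors(q)
    X_out: List[List[float]] = []
    y_out: List[int] = []
    for xi, yi in zip(X, y):
        copies = k1 if yi == 1 else k0
        X_out.append(list(xi))  # the original point is kept unchanged
        y_out.append(yi)
        for _ in range(copies - 1):
            X_out.append([v + rng.uniform(-jitter, jitter) for v in xi])
            y_out.append(yi)
    return X_out, y_out


def quantile_grid(K: int) -> List[float]:
    """Midpoint grid q_k = (k - 1/2) / K for k = 1..K."""
    return [(k - 0.5) / K for k in range(1, K + 1)]


def prob_from_votes(votes: Sequence[int]) -> float:
    """Assumed combination rule: the fraction of grid classifiers voting 1.

    If h_q(x) = 1 exactly when p(x) > q, then m = sum(votes) places p(x)
    in ((m - 1/2)/K, (m + 1/2)/K] (clipped to [0, 1]), so m / K is within
    1 / (2K) of p(x).
    """
    return sum(votes) / len(votes)


if __name__ == "__main__":
    # Toy oracle that knows p(x) = 0.37, standing in for fitted classifiers.
    p_true = 0.37
    votes = [int(p_true > q) for q in quantile_grid(10)]
    print(prob_from_votes(votes))  # 0.4
    print(sampling_factors(cost_to_quantile(1.0, 3.0)))  # (1, 3)
```

The 1/(2K) error bound is an idealization: it holds only if every grid classifier is exact, so that their votes are monotone in q. Fitted classifiers can disagree with one another across the grid, in which case the vote fraction is only a rough estimate.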
Similar resources
A Study of Probability Estimation Techniques for Rule Learning
Rule learning is known for its descriptive and therefore comprehensible classification models which also yield good class predictions. However, in some application areas, we also need good class probability estimates. For different classification models, such as decision trees, a variety of techniques for obtaining good probability estimates have been proposed and evaluated. However, so far, th...
An Inverse-Quantile Function Approach for Modeling Electricity Price
We propose a class of alternative stochastic volatility models for electricity prices using the quantile function modeling approach. Specifically, we fit marginal distributions of power prices to two special classes of distributions by matching the quantile of an empirical distribution to that of a theoretical distribution. The distributions from the first class have closed-form formulas for pr...
Improved Class Probability Estimates from Decision Tree Models
Decision tree models typically give good classification decisions but poor probability estimates. In many applications, it is important to have good probability estimates as well. This paper introduces a new algorithm, Bagged Lazy Option Trees (B-LOTs), for constructing decision trees and compares it to an alternative, Bagged Probability Estimation Trees (B-PETs). The quality of the class proba...
High Quantile Estimation and the Port Methodology
In many areas of application, a typical requirement is to estimate a high quantile χ_{1−p} of probability 1−p, a value, high enough, so that the chance of an exceedance of that value is equal to p, small. The semi-parametric estimation of high quantiles depends not only on the estimation of the tail index γ, the primary parameter of extreme events, but also on an adequate estimation of a scale f...
An efficient model-free estimation of multiclass conditional probability
Conventional multiclass conditional probability estimation methods, such as Fisher’s discriminant analysis and logistic regression, often require restrictive distributional model assumptions. In this paper, a model-free estimation method is proposed to estimate multiclass conditional probability through a series of conditional quantile regression functions. Specifically, the conditional class pr...
Journal title:
- Journal of Machine Learning Research
Volume 8, Issue
Pages -
Publication date: 2007